Abstract

This paper considers a dynamic game with transferable utilities (TU), where the characteristic
function is a continuous-time bounded mean ergodic process. A central planner interacts continuously 
over time with the players by choosing the instantaneous allocations subject to budget constraints. 
Before the game starts, the central planner knows the nature of the process (bounded mean ergodic), 
the bounded set from which the coalitions' values are sampled, and the long run average coalitions' 
values. On the other hand, he has no knowledge of the underlying probability function generating 
the coalitions' values. Our goal is to find allocation rules that use a measure of the extra reward 
that a coalition has received up to the current time by re-distributing the budget among the players. 
The objective is two-fold: i) guaranteeing convergence of the average allocations to the core (or a 
specific point in the core) of the average game, ii) driving the coalitions' excesses to an a priori given 
cone. The resulting allocation rules are robust as they guarantee the aforementioned convergence 
properties despite the uncertain and time-varying nature of the coalitions' values. We highlight three
main contributions. First, we design an allocation rule based on full observation of the extra reward 
so that the average allocation approaches a specific point in the core of the average game, while 
the coalitions' excesses converge to an a priori given direction. Second, we design a new allocation 
rule based on partial observation on the extra reward so that the average allocation converges to the 
core of the average game, while the coalitions' excesses converge to an a priori given cone. And
third, we establish connections to approachability theory [9], [18] and attainability theory [4], [19].

Keywords Coalitional games with transferable utilities; allocation processes; approachability
theory; Lyapunov stochastic stability.

I. Introduction 

Coalitional games with transferable utilities (TU), first introduced by von Neumann and
Morgenstern [25], have recently sparked much interest in the control and communication
engineering communities [21]. In essence, coalitional TU games are comprised of a set of
players who can form coalitions and a characteristic function associating a real number with 
every coalition. This real number represents the value of the coalition and can be thought of 
as a monetary value that can be distributed among the members of the coalition according to 
some appropriate fairness allocation rule. The value of a coalition also reflects the monetary 
benefit demanded by a coalition to be a part of the grand coalition. 

This paper considers a dynamic TU game, where the characteristic function is a bounded
mean ergodic process. Bounded means that the characteristic function takes values in a bounded
convex set according to an unknown probability distribution. Mean ergodic means that the expected
value of the coalitions' values at each time coincides with the long-term average. With the
dynamic game we associate a dynamic average game obtained by averaging the coalitions'
values over time, and we assume that the core of the average game is nonempty in the long run.
Given the above dynamic TU game, a central planner interacts continuously over time with 
the players by choosing the instantaneous allocations subject to budget constraints. Before 
the game starts, the central planner knows the nature of the process (bounded mean ergodic), 
the bounded set and the long run average coalitions' values. On the other hand, he has no 
knowledge of the underlying probability function generating the instantaneous coalitions' 
values. Our goal is to find allocation rules that use a measure of the extra reward that a 
coalition has received up to the current time by re-distributing the budget among the players. 
The objective is two-fold: i) guaranteeing convergence of the average allocations to the core 
(or a specific point in the core) of the average game, ii) driving the coalitions' excesses 
to an a priori given cone. The resulting allocation rules are robust as they guarantee the 
aforementioned convergence properties despite the uncertain and time-varying nature of the 
coalitions' values.



April 24, 2012 



DRAFT 






In the context of coalitional TU games, robustness and dynamics naturally arise in all
situations where the coalitions' values are uncertain and time-varying; see, e.g., [7]. Robustness
has to do with modeling coalitions' values as unknown entities, and this is in the spirit of some
literature on stochastic coalitional games [23], [24]. However, we deviate from the latter works
since the probability function generating the random coalitions' values is unknown, and this is
more in line with the concept of Unknown But Bounded (UBB) variables formalized in [8].
It is worth mentioning that this formulation shares some common elements with the recent
literature on interval-valued games [1], where the authors use intervals to describe coalitions'
values, much as is done in this paper. The interval nature of coalitions' values
generally arises from the optimistic and pessimistic expectations of the coalitions [11] when
cooperation is achieved from a strategic form game. We also note some differences, in that
we focus here more on the time-varying nature of the coalitions' values. In doing so, we also
link the approach to set invariance theory [10] and stochastic stability theory [20], which
provide us with some useful tools for stability analysis (see, e.g., the use of a Lyapunov function
in the proof of Theorem 4.1).

Bringing dynamical aspects into the framework of coalitional TU games is an element in
common with other papers [13], [16], [17]. The main difference with those works is that here
the values of coalitions are realized exogenously and no relation exists between consecutive
samples.

Convergence conditions, together with the idea that allocation rules use a measure of the
extra reward that a coalition has received up to the current time by re-distributing the budget
among the players, are a main issue in a number of other papers [12], [15], [18], [22] as
well. However, this paper departs from the aforementioned ones mainly in that the dynamics in
those works are captured by a bargaining mechanism with fixed coalitions' values, while we
let the values be time-varying and uncertain. This last element adds some robustness to our
allocation rule, which has not been dealt with before.

The main contribution of this paper is captured by the following three results. First, we 
design an allocation rule based on full observation of the extra reward so that the average 
allocation approaches a specific point in the core of the average game, while the coalitions' 
excesses converge to an a priori given direction. Second, we design a new allocation rule 
based on partial observation on the extra reward so that the average allocation converges 
to the core of the average game, while the coalitions' excesses converge to an a priori
given cone. Convergence of both allocation rules is proved via Lyapunov stochastic stability
theory. And third, we establish connections of the Lyapunov stochastic stability theory to
approachability theory [9], [18] and attainability theory [4], [19].

A few other contributions of the paper are the following. First, we define the average game,
whose role becomes fundamental when the variations of the coalitions' values are known with
delay by the planner. Second, we reformulate the problem as a network flow control problem,
where the allocation rule turns into a robust control policy; the importance of
such a reformulation lies in the fact that we can prove the convergence of the allocations
using the strong tools of Lyapunov stochastic stability theory. Finally, the idea of
turning a coalitional TU game setup into a control-theoretic problem is a novel one and
represents, by far, the main characteristic of this work.

The paper is organized as follows. In Section II we formulate the problem. In Section III
we present the basic idea of our solution approach. In Section IV we state the three main
results of this work and postpone the derivation of such results to Section V. In Section VI
we provide some numerical illustrations. Finally, in Section VII we draw some concluding
remarks.

Notation. We view vectors as columns. For a vector $x$, we use $x_i$ or $[x]_i$ to denote its $i$th
coordinate component. For two vectors $x$ and $y$, we use $x < y$ ($x \le y$) to denote $x_i < y_i$
($x_i \le y_i$) for all coordinate indices $i$. We let $x^T$ denote the transpose of a vector $x$, and
$\|x\|_n$ denote its $n$-norm. For a matrix $A$, we use $a_{ij}$ or $[A]_{ij}$ to denote its $ij$th entry. We use
$|a_{ij}|$ to denote the absolute value of scalar $a_{ij}$. Given two sets $U$ and $S$, we write $U \subset S$
to denote that $U$ is a proper subset of $S$. We use $|S|$ for the cardinality of a given finite
set $S$. Let $\Phi$ be a closed and convex set in $\mathbb{R}^m$; we use $P(y)$ to denote the projection of
any point $y \in \mathbb{R}^m$ onto $\Phi$ (closest point to $y$ in $\Phi$). We also denote by $\partial\Phi$ the boundary of
$\Phi$ and by $n_y$ the outward normal for any $y \in \partial\Phi$. We use $\mathrm{dist}(y, \Phi)$ to denote the Euclidean
distance between point $y$ and set $\Phi$. Given a set $N$ of players and a function $\eta : S \mapsto \mathbb{R}$
defined for each nonempty coalition $S \subseteq N$, we write $\langle N, \eta \rangle$ to denote the transferable
utility (TU) game with the players' set $N$ and the characteristic function $\eta$. We let $\eta_S$ be
the value $\eta(S)$ of the characteristic function $\eta$ associated with a nonempty coalition $S \subseteq N$.
Given a TU game $\langle N, \eta \rangle$, we use $C(\eta)$ to denote the core of the game,
$C(\eta) = \{ x \in \mathbb{R}^{|N|} : \sum_{i \in N} x_i = \eta_N,\ \sum_{i \in S} x_i \ge \eta_S \text{ for all nonempty } S \subset N \}$.
Also, $\mathbb{R}_+$ denotes the set of nonnegative real numbers. Given a random vector $\xi$, the notation
$E[\xi]$ denotes its expected value. Given a random process $\{v(t)\}$, we denote by
$\tilde v(t) = \int_0^t v(\tau)\,d\tau$ its integral and by $\bar v(t) = \tilde v(t)/t$ its average up to time $t$.
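The core definition above can be checked mechanically for small games. The following is a minimal sketch, not taken from the paper: the helper names `coalitions` and `in_core` and the three-player values are hypothetical, chosen only to illustrate the efficiency and coalitional-rationality conditions.

```python
from itertools import combinations

def coalitions(players):
    """All nonempty coalitions of `players`, as tuples in increasing size."""
    return [c for r in range(1, len(players) + 1)
            for c in combinations(players, r)]

def in_core(x, eta, players, tol=1e-9):
    """True if allocation x is efficient and no coalition gets less than its value."""
    grand = tuple(players)
    # Efficiency: the grand coalition's value is fully distributed.
    if abs(sum(x[i] for i in players) - eta[grand]) > tol:
        return False
    # Coalitional rationality: sum_{i in S} x_i >= eta_S for every coalition S.
    return all(sum(x[i] for i in S) >= eta[S] - tol for S in coalitions(players))

players = (1, 2, 3)
# Hypothetical game: singletons are worth 0, pairs 4, the grand coalition 9.
eta = {S: {1: 0.0, 2: 4.0, 3: 9.0}[len(S)] for S in coalitions(players)}

equal_split = {1: 3.0, 2: 3.0, 3: 3.0}
lopsided = {1: 8.0, 2: 1.0, 3: 0.0}  # coalition (2, 3) receives 1 < 4
print(in_core(equal_split, eta, players), in_core(lopsided, eta, players))
```

The enumeration is exponential in the number of players, so this is only meant for small illustrative games.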

II. Model and problem formulation 

In this section, we formulate the problem in its generic form and elaborate on the role of
information. Let $N = \{1, \dots, n\}$ be a set of players and let $S \subseteq N$ range over the (nonempty)
coalitions arising among these players. Denote by $m = 2^n - 1$ the number of possible
coalitions. We assume that time is continuous and use $t \in \mathbb{R}_+$ to index time.

We consider a dynamic TU game, denoted $\langle N, \{v(t)\} \rangle$, where $\{v(t)\}$ is a continuous
flow of characteristic functions. The flow $\{v(t)\}$ describes a bounded mean ergodic process.
By bounded we mean that, given a bounded convex set $V \subset \mathbb{R}^m$ and a probability function
$P \in \Delta(V)$, where $\Delta(V)$ is the set of probability functions on $V$, for all $t \in \mathbb{R}_+$ each
random variable $v(t)$ takes values in $V \subset \mathbb{R}^m$ according to probability $P$, as expressed in (1);
by mean ergodic we mean that its expected value coincides with the long-term average, as in
(2):

$v(t) \in V \subset \mathbb{R}^m$, for all $t \in \mathbb{R}_+$, (1)
$E[v(t)] = \lim_{T \to \infty} \bar v(T)$, for all $t \in \mathbb{R}_+$. (2)
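Conditions (1) and (2) can be illustrated numerically. The sketch below is an assumption-laden toy: it replaces continuous time by discrete samples and picks a uniform distribution on a hypothetical interval, whereas in the paper the distribution $P$ is unknown to the planner.

```python
import random

def time_average(samples):
    """Discrete-time surrogate for the running average of the process."""
    return sum(samples) / len(samples)

random.seed(0)
V_LOW, V_HIGH = 0.0, 2.0  # hypothetical bounded set V for a single coalition value
samples = [random.uniform(V_LOW, V_HIGH) for _ in range(20000)]

v_bar = time_average(samples)        # long-run average, a stand-in for v_nom
expected = 0.5 * (V_LOW + V_HIGH)    # E[v(t)] under the assumed distribution
print(round(v_bar, 3), expected)
```

With independent samples the running average approaches the expected value, which is the property that (2) postulates for the coalitions' values.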

Thus, in the dynamic TU game $\langle N, \{v(t)\} \rangle$, the players are involved in a sequence of
instantaneous TU games whereby, at each time $t$, the instantaneous TU game is $\langle N, v(t) \rangle$
with $v(t) \in V$ for all $t \ge 0$. Further, we let $v_S(t)$ denote the value assigned to a nonempty
coalition $S \subseteq N$ in the instantaneous game $\langle N, v(t) \rangle$.

With the dynamic game we associate a dynamic average game $\langle N, \{\bar v(t)\} \rangle$ and an
instantaneous average game at time $t > 0$, $\langle N, \bar v(t) \rangle$.

The motivation for formalizing the above dynamic TU games is that such games represent
a stylized model of all those scenarios where the coalitions' values vary with time.

We assume that the core of the average game is nonempty in the long run. We will see
that without this assumption the problem under study has no solution. Thus, denote by $v_{nom}$
the (long run) average coalitions' values, namely, $v_{nom} := \lim_{t \to \infty} \bar v(t)$, and let $C(v_{nom})$ be
the core of the average game.

Assumption 1: (balancedness) The core of the average game is nonempty in the limit: $C(v_{nom}) \neq \emptyset$.

We can view the above assumption as introducing some steady-state (average) conditions on
a game scenario subject to instantaneous fluctuations. However, note that we do not make 
assumptions regarding the balancedness of the instantaneous games which is the case with 
IITI. Thus, the core of the instantaneous game can be empty at some time t. 

Given the above dynamic TU game, a central planner interacts continuously over time with
the players by choosing the instantaneous allocations denoted by $a(t) \in \mathbb{R}^n$. We assume that
the allocations are subject to the following budget constraints.

Assumption 2: (bounded allocation) The instantaneous allocation is bounded within a
hyperbox in $\mathbb{R}^n$,

$a(t) \in A := \{a \in \mathbb{R}^n : a_{min} \le a \le a_{max}\}$,

with a priori given lower and upper bounds $a_{min}, a_{max} \in \mathbb{R}^n$.

As regards the information available a priori (before the game starts) to the central planner,
we assume that he knows the nature of the process $\{v(t)\}$ (bounded mean ergodic), the
bounded set $V$ and the long run average coalitions' values $v_{nom}$. The latter is the same as
saying that he knows the expected coalitions' values for all $t \in \mathbb{R}_+$. On the other hand, he
has no knowledge of the underlying probability function $P$.

Assumption 3: (on available information) The planner knows $v_{nom}$.

Besides this, during the game the central planner also observes the extra reward of the
coalitions up to $t$, for all $t \in \mathbb{R}_+$. Given this, and in line with a number of other papers
[2], [12], [15], [18], [22], our goal is to find allocation rules that use a measure of the extra
reward that a coalition has received up to the current time by re-distributing the budget among
the players. To do this, a first step is to define excesses for the coalitions. For any coalition
$S \subseteq N$, we define the excess (extra reward) at time $t \ge 0$ as the excess at time $t = 0$ plus the
difference between the total integral reward given to it and the integral value of the coalition
itself, i.e.,

$\epsilon_S(t) = \sum_{i \in S} \tilde a_i(t) - \tilde v_S(t) + \epsilon_S(0)$.
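To make the excess definition concrete, the following sketch accumulates the excess of one coalition with a forward-Euler approximation of the two integrals. It is an illustration only: the function name `excess_path`, the step size and the numerical values are hypothetical.

```python
def excess_path(a_path, v_path, S, dt, eps0=0.0):
    """Excess of coalition S over time.

    a_path: list of dicts mapping player -> instantaneous allocation a_i(t_k)
    v_path: list of instantaneous coalition values v_S(t_k)
    The integrals of a_i and v_S are approximated by forward-Euler sums.
    """
    eps = eps0
    path = [eps]
    for a, v_S in zip(a_path, v_path):
        eps += dt * (sum(a[i] for i in S) - v_S)
        path.append(eps)
    return path

dt = 0.5
a_path = [{1: 2.0, 2: 1.0}] * 4   # hypothetical constant allocations
v_path = [2.0, 4.0, 2.0, 3.0]     # hypothetical values of coalition S = (1, 2)
path = excess_path(a_path, v_path, S=(1, 2), dt=dt)
print(path)
```

A nonnegative entry of `path` means that, up to that time, the coalition has received at least its accumulated value.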

Furthermore, assuming without loss of generality $\epsilon_S(0) = 0$, we say that $S$ is in excess at
time $t \ge 0$ if the excess is nonnegative, i.e., $\sum_{i \in S} \tilde a_i(t) \ge \tilde v_S(t)$. Let $\epsilon(t)$ represent the vector
of coalitions' excesses, formally given as: $\epsilon(t) := [\epsilon_S(t)]_{S \subseteq N} \in \mathbb{R}^m$.

We are interested in answering two main questions for this class of games. 

• Question 1: Are there allocation rules such that the average allocations converge? If
yes, let us denote by $A_0$ the set to which the average allocations converge. Can we
make them converge to the core of the average game, $A_0 \subseteq C(v_{nom})$? Can we guarantee
convergence to a specific point of the core, call it the nominal allocation $a_{nom}$, that we have
selected a priori?

• Question 2: Are there allocation rules such that the coalitions' excesses $\epsilon(t)$ converge
to an a priori given cone $\Sigma_0$, say for instance the nonnegative $m$-dimensional orthant
$\mathbb{R}^m_+$, or any direction $\alpha t$ for $t \ge 0$ with fixed $\alpha \in \mathbb{R}^m$?

To motivate the above questions, think of a situation where the objective of the central
planner is to maintain the stability of the grand coalition in an average sense, while controlling
the coalitions' excesses at each time $t \in \mathbb{R}_+$.

We are now in the position to provide a formal and generic statement of the problem.
Henceforth, we use the symbol w.p.1 to mean "with probability one".

Problem 2.1: Find an allocation rule $f : \mathbb{R}^m \to A \subset \mathbb{R}^n$ such that if $a(t) = f(\epsilon(t))$ then
i) $\lim_{t \to \infty} \bar a(t) \in A_0 \subseteq C(v_{nom})$ w.p.1, and ii) $\lim_{t \to \infty} \epsilon(t) \in \Sigma_0 \subseteq \mathbb{R}^m$ w.p.1.

Observe that because of the random nature of the coalitions' values $v(t)$, both the excesses
$\epsilon(t)$ and the allocations $a(t)$ are random, and as such we look at the convergence of $\bar a(t)$ w.p.1.
Essentially, we require that the probability of $\bar a(t)$ converging in the limit to $A_0 \subseteq C(v_{nom})$
is 1. Similarly for $\epsilon(t)$ and $\Sigma_0$. This type of convergence is also known as almost sure
convergence [20].

We will show that if the planner has full observation of $\epsilon(t)$ at every time $t$ then the above
problem is solvable even under the very strict condition of $A_0 = a_{nom}$ and $\Sigma_0 = \alpha t$, $t \ge 0$,
with fixed $\alpha$. Conversely, if the planner has partial observation of $\epsilon(t)$, in that he only knows
the sign of each component of $\epsilon(t)$, then the problem is still solvable but under the relaxed
condition of $A_0 = C(v_{nom})$ and $\Sigma_0 \subseteq \mathbb{R}^m_+$.
A. Motivations

Dynamic coalitional games capture coordination in a number of network flow applications.
Network flows model the flow of goods, materials, or other resources between different
production/distribution sites [5]. We next provide a supply chain application that justifies the model
under study.

A single warehouse $v_0$ serves a number of retailers $v_i$, $i = 1, \dots, n$, each one facing a
demand $d_i(t)$ unknown but bounded by pre-assigned values $d_i^{min} \in \mathbb{R}$ and $d_i^{max} \in \mathbb{R}$ at any
time period $t \ge 0$. After demand $d_i(t)$ has been realized, retailer $v_i$ must choose to either
fulfill the demand or not. The retailers do not hold any private inventory and, therefore, if they
wish to fulfill their demands, they must reorder goods from the central warehouse. Retailers
benefit from joint reorders as they may share the total transportation cost $K$ (this cost could
also be time and/or player dependent). In particular, if retailer $v_i$ "plays" individually, the
cost of reordering coincides with the full transportation cost $K$. Indeed, when necessary a
single truck will serve only him and get back to the warehouse. This is illustrated by the
dashed cycles $(v_0, v_8, v_0)$, $(v_0, v_9, v_0)$, and $(v_0, v_{10}, v_0)$ in the network of Figure 1. The cost
of not reordering is the cost of the unfulfilled demand $d_i(t)$.



Fig. 1. Example of a distribution network. (a) ... $\{v_1, \dots, v_4\}$, $\{v_5, \dots, v_7\}$, $\{v_8\}$, $\{v_9\}$, and $\{v_{10}\}$ respectively. (b) One single truck (cycle) leaving $v_0$ and serving coalition $\{v_1, \dots, v_{10}\}$.

If two or more retailers "play" in a coalition, they agree on a joint decision ("everyone
reorders" or "no one reorders"). The cost of reordering for the coalition also equals the total
transportation cost that must be shared among the retailers. In this case, when necessary a
single truck will serve all retailers in the coalition and get back to the warehouse. This is
illustrated, with reference to coalition $\{v_1, \dots, v_4\}$, by the dashed cycle $(v_0, v_4, v_1, v_2, v_3, v_0)$ in
Figure 1(a). A similar comment applies to the coalition $\{v_5, v_6, v_7\}$ and the cycle $(v_0, v_5, v_6, v_7, v_0)$
in Figure 1(a). The network topology in Figure 1(a) describes the existing coalitions. This
is clear if we look at the subgraph induced by the vertex set $\{v_1, \dots, v_{10}\}$ (all vertices
except $v_0$) and observe that such a subgraph has 5 connected components, i.e., $\{v_1, \dots, v_4\}$,
$\{v_5, \dots, v_7\}$, $\{v_8\}$, $\{v_9\}$, and $\{v_{10}\}$, and that each component corresponds to an existing
coalition. The cost of not reordering is the sum of the unfulfilled demands of all retailers.
How the players will share the cost is a part of the solution generated by the bargaining
process.

Conversely, the subgraph induced by $\{v_1, \dots, v_{10}\}$ in Figure 1(b) has a single connected
component, which means that all retailers "play" in the grand coalition and as such one single
truck (cycle) will leave $v_0$ and serve all of them before returning to $v_0$. This is represented
by the dashed cycle $(v_0, v_4, \dots, v_{10})$ in the same figure.

The cost scheme can be captured by a game with the set $N = \{v_1, \dots, v_n\}$ of players
where the cost of a nonempty coalition $S \subseteq N$ is given by

$c_S(t) = \min\{K, \sum_{i \in S} d_i(t)\}$.

Note that the bounds on the demand $d_i(t)$ translate into bounds on the cost as follows: for
all nonempty $S \subseteq N$ and $t \ge 0$,

$\min\{K, \sum_{i \in S} d_i^{min}\} \le c_S(t) \le \min\{K, \sum_{i \in S} d_i^{max}\}$. (3)

To complete the derivation of the coalitions' values we need to compute the cost savings
$v_S(t)$ of a coalition $S$ as the difference between the sum of the costs of the coalitions of the
individual players in $S$ and the cost of the coalition itself, namely,

$v_S(t) = \sum_{i \in S} c_{\{i\}}(t) - c_S(t)$.

Given the bound for $c_S(t)$ in (3), the value $v_S(t)$ is also bounded, as given: for any $S \subseteq N$
and $t \ge 0$,

$v_S(t) \le \sum_{i \in S} \min\{K, d_i^{max}\} - \min\{K, \sum_{i \in S} d_i^{min}\}$.

Thus, the cost savings (value) of each coalition is bounded uniformly by a maximum value.
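The cost and cost-savings formulas of the joint replenishment example can be evaluated directly. The sketch below uses hypothetical demands, demand bounds and transportation cost; the helper names are illustrative and not part of the paper.

```python
def cost(S, d, K):
    """Coalition cost: either pay the truck (K) or leave the joint demand unfulfilled."""
    return min(K, sum(d[i] for i in S))

def savings(S, d, K):
    """Coalition value: sum of stand-alone costs minus the cost of acting jointly."""
    return sum(cost((i,), d, K) for i in S) - cost(S, d, K)

def savings_upper_bound(S, d_min, d_max, K):
    """Upper bound on the savings obtained from the demand bounds."""
    return (sum(min(K, d_max[i]) for i in S)
            - min(K, sum(d_min[i] for i in S)))

K = 10.0                                   # hypothetical transportation cost
d = {1: 6.0, 2: 7.0, 3: 2.0}               # hypothetical realized demands
d_min = {1: 4.0, 2: 5.0, 3: 1.0}
d_max = {1: 8.0, 2: 9.0, 3: 3.0}
S = (1, 2, 3)

v_S = savings(S, d, K)
bound = savings_upper_bound(S, d_min, d_max, K)
print(v_S, bound)
```

Here the three retailers would pay 6 + 7 + 2 = 15 acting alone and 10 acting jointly, so the coalition value is 5, below the bound of 10.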

Introducing time aspects into a static TU game opens the possibility of modeling aspects
such as intertemporal transfers, patience and expectations of players/coalitions. A generic
dynamic coalitional game description should capture these features. In a repeated joint
replenishment game such as the one discussed above, allocation rules having the properties formalized
in Problem 2.1 encourage patient retailers to "play" in the grand coalition, to coordinate
their replenishment policies and therefore to reduce total transportation costs. We say patient
retailers since condition i) in Problem 2.1 guarantees convergence to the core in the long run,
i.e., in an average sense. Condition ii) has the meaning of bounding the excesses during the
transient (before convergence occurs).

III. Flow transformation based dynamics 

The basic idea of our solution approach is to recast the problem into a flow control one.
To do this, consider the hypergraph $\mathcal{H}$ with vertex set $V$ and edge set $E$ as:

$\mathcal{H} := \{V, E\}$, $V := \{v_1, \dots, v_m\}$, $E := \{e_1, \dots, e_n\}$.

Figure 2 depicts an example of hypergraph for a 3-player coalitional game. The vertex set $V$
has one vertex per each coalition whereas the edge set $E$ has one edge per each player. A
generic edge $i$ is incident to a vertex $v_j$ if the player $i$ is in the coalition associated to $v_j$.
So, incidence relations are described by the matrix $B_H$ whose rows are the characteristic vectors
$c^S \in \mathbb{R}^n$. We recall that the components of a characteristic vector are $c^S_i = 1$ if $i \in S$ and
$c^S_i = 0$ if $i \notin S$.

Fig. 2. Hypergraph $\mathcal{H} := \{V, E\}$ for a 3-player coalitional game.

The flow control reformulation arises naturally if we view allocation $a_i(t)$ as the
flow on edge $e_i$ and the coalition value $v_S(t)$ of a generic coalition $S$ as the demand in the
corresponding vertex $v_j$. In view of this, allocation in the core translates into over-satisfying
the demand at the vertices. Specifically,

$a(t) \in C(v(t)) \Leftrightarrow B_H a(t) \ge v(t)$, (4)

with the last inequality satisfied with the equal sign due to the efficiency condition of the
core, i.e., $\sum_{i=1}^n a_i(t) = v_m(t)$, where $v_m(t)$ denotes the $m$th component of $v(t)$ and is equal
to the grand coalition value $v_N(t)$. Now, since $v(t)$ is unobservable by the planner at time $t$,
we need to introduce some allocation error dynamics which accounts for the derivatives of
the excesses. Since $\epsilon(t)$ represents the coalition excess, we have:

$\dot\epsilon(t) = B_H a(t) - v(t)$. (5)

Note that the above differential equation admits a solution at least in the sense of Filippov
[14]. From (4), and by averaging and taking the limit in (5), we can reformulate Problem 2.1
as a flow control problem where a controller wishes to drive the quantity
$\lim_{t \to \infty} \frac{\epsilon(t) - \epsilon(0)}{t}$ to the target set $T$, defined below, w.p.1 (see, e.g., Fig. 3):

$T := \{r \in \mathbb{R}^m : r_m = 0,\ r_j \ge 0,\ \forall j = 1, \dots, m-1\}$.

Note that $r_m = 0$ due to efficiency of allocations.

Fig. 3. Trajectory for $\frac{\epsilon(t) - \epsilon(0)}{t}$.
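The equivalence in (4) between core membership and over-satisfied vertex demands can be sketched for a 3-player game as follows. The ordering of coalitions (by size, grand coalition last), the helper names and the numerical values are assumptions made for this illustration.

```python
from itertools import combinations

def incidence_matrix(n):
    """Rows are the characteristic vectors of all nonempty coalitions of
    players 0..n-1, ordered by size so that the grand coalition is the last row."""
    rows = []
    for r in range(1, n + 1):
        for S in combinations(range(n), r):
            rows.append([1 if i in S else 0 for i in range(n)])
    return rows

def mat_vec(M, x):
    """Plain matrix-vector product."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

B_H = incidence_matrix(3)
v = [0, 0, 0, 4, 4, 4, 9]   # hypothetical coalition values, same order as the rows
a = [3, 3, 3]               # candidate allocation

flow = mat_vec(B_H, a)      # total allocation received by each coalition
# Core test: demand over-satisfied at every vertex, with equality at the last one.
is_core = (all(f >= v_S for f, v_S in zip(flow[:-1], v[:-1]))
           and flow[-1] == v[-1])
print(flow, is_core)
```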
Remark 3.1: Driving the average allocations to a particular point $a_{nom} \in A_0 \subseteq C(v_{nom})$
results in reaching a specific point in the target set $T$. To see this, note that when
$\lim_{t \to \infty} \bar a(t) = a_{nom}$ we have $T \ni B_H a_{nom} - v_{nom} \ge 0$ due to the property of the core.
Thus, we also have that $\lim_{t \to \infty} \frac{\epsilon(t) - \epsilon(0)}{t}$ is driven to the point $B_H a_{nom} - v_{nom} \in T$.

The inequality condition in (4) is transformed into equality type by introducing, from standard
LP techniques, $m - 1$ surplus variables (one per each coalition other than the grand coalition).
This increases the dimension of the control space of the planner from $n$ to $n + m - 1$, and
the dynamics (5) can be rewritten as follows:



$\dot x(t) = B u(t) - v(t)$, $v(t) \in V$, (6)

where $B = \left[\, B_H \;\middle|\; \begin{matrix} -I \\ 0 \end{matrix} \,\right] \in \mathbb{R}^{m \times (n+m-1)}$. Variable $x(t)$ represents the state of the system
and captures the deviation from the balanced system, i.e., the system characterized by $a_{nom}$ and
$v_{nom}$. We introduce the set of feasible controls as:

$U := \{u(t) \in \mathbb{R}^{n+m-1} : u(t) = [a^T(t)\ s^T(t)]^T,\ a(t) \in A,\ s(t) \ge 0\}$. (7)
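The role of the surplus variables can be checked on a small example. The sketch below assumes that the augmented matrix has the block form $[B_H \mid -I;\ 0]$, i.e., one surplus column per coalition other than the grand coalition; this structure, the function name `augmented_matrix` and the 2-player data are assumptions for illustration.

```python
def augmented_matrix(B_H):
    """Append m-1 surplus columns to B_H: -1 on the diagonal for every
    coalition except the grand coalition (last row), 0 elsewhere."""
    m = len(B_H)
    B = []
    for j, row in enumerate(B_H):
        surplus_cols = [-1 if (k == j and j < m - 1) else 0 for k in range(m - 1)]
        B.append(list(row) + surplus_cols)
    return B

def mat_vec(M, x):
    """Plain matrix-vector product."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

# Hypothetical 2-player game: coalitions {1}, {2}, {1,2}.
B_H = [[1, 0], [0, 1], [1, 1]]
v = [1, 2, 5]
a = [2, 3]                                   # efficient allocation in the core

B = augmented_matrix(B_H)
s = [f - v_S for f, v_S in zip(mat_vec(B_H, a)[:-1], v[:-1])]  # surpluses, all >= 0
u = a + s                                    # control u = [a^T s^T]^T
print(B, u, mat_vec(B, u))
```

With the surpluses chosen this way the inequality $B_H a \ge v$ becomes the equality $B u = v$, which is the balanced condition that the state $x(t)$ measures deviations from.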

Toward the reformulation of the problem as a stochastic stabilizability one, we introduce the
following preliminary result.

Lemma 3.1: If the variable $x(t)$ is asymptotically stable almost surely, i.e., (8) holds true,
then the average allocations converge to the core of the average game w.p.1, as expressed by (9),
and the excesses converge to the cone $\mathbb{R}^m_+$ w.p.1, as described in (10):

$\lim_{t \to \infty} x(t) = 0$, w.p.1, (8)
$\lim_{t \to \infty} \bar a(t) \in C(v_{nom})$, w.p.1, (9)
$\lim_{t \to \infty} \epsilon(t) \in \mathbb{R}^m_+$, w.p.1. (10)

Proof: To see why (8) implies (9), observe that if $\lim_{t \to \infty} x(t) = 0$ w.p.1, then
$\lim_{t \to \infty} x(t)/t = 0$ w.p.1 and therefore, by integrating and dividing by $t$ in (6), also
$\lim_{t \to \infty} B \bar u(t) - \bar v(t) = 0$ w.p.1. The latter can be rewritten as
$\lim_{t \to \infty} B \bar u(t) = v_{nom}$ w.p.1, and as from (7) $\bar s(t) \ge 0$, so that in the limit
$B_H \bar a(t) - \bar v(t) \ge 0$ with equality for the grand coalition, and $v_{nom}$ is balanced by
Assumption 1, we conclude that $\lim_{t \to \infty} \bar a(t) \in C(v_{nom})$ w.p.1.

To see why (8) implies (10), observe that if $\lim_{t \to \infty} x(t) = 0$ w.p.1, from (7) and under
the assumption $x(0) = \epsilon(0) = 0$, then $\lim_{t \to \infty} \epsilon(t) = \lim_{t \to \infty} \tilde s(t) \ge 0$ and (10) is proved. ■
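The integration step used in the proof can be spelled out as follows. This is a sketch under the notation assumed here, with $\tilde u$, $\tilde v$ the integrals and $\bar u$, $\bar v$ the time averages of $u$ and $v$:

```latex
\begin{align*}
x(t) - x(0) &= \int_0^t \dot x(\tau)\,d\tau
             = B \int_0^t u(\tau)\,d\tau - \int_0^t v(\tau)\,d\tau
             = B\tilde u(t) - \tilde v(t), \\
\frac{x(t) - x(0)}{t} &= B\bar u(t) - \bar v(t).
\end{align*}
```

If $x(t)$ converges to zero, the left-hand side of the second line vanishes as $t \to \infty$, which gives $\lim_{t \to \infty} B\bar u(t) = \lim_{t \to \infty} \bar v(t) = v_{nom}$.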
It is worth noting that condition (9) is part of Problem 2.1. In other words, when solving
Problem 2.1 we always guarantee (9). If this is clear, then we can use the above lemma to
rephrase Problem 2.1. In doing this we need to make a partial distinction between cases i)
and ii). More specifically, case ii), where $A_0 = C(v_{nom})$, can be restated as follows:

Find $u(t) := \phi(x(t)) \in U$ such that $\lim_{t \to \infty} x(t) = 0$ w.p.1. (11)

Note that if we wish to reach a specific point $a_{nom}$ then the condition (8) is only necessary
and the resulting problem is a stricter version of (11).

IV. Main results 

In this section we present the three main results of this work. The first one relates to the
case where the planner has full observation of $x(t)$, in which case the average allocation
can be driven to a specific point in the core of the average game. The second result applies
to the case where the planner has partial observation of $x(t)$, and convergence to the core
can still be guaranteed but not to a specific point of the core. The third result highlights
connections of the implemented solution approach to the approachability principle [9], [18]
and attainability principle [4], [19].

A. Full information case 

In this section, we solve Problem 2.1 with $A_0 = a_{nom}$ and $\Sigma_0 = \alpha t$, $t \ge 0$, with fixed $\alpha$,
under the assumption that the planner has full observation of the excesses $\epsilon(t)$ and therefore
of $x(t)$ as well. We recall that inferring $x(t)$ from $\epsilon(t)$ is possible as the surplus $s(t)$ is selected
by the planner. As we have said before, the problem that we solve is a stricter version of (11).
This version derives from augmenting the state of dynamics (6) as explained in the rest of this
section. Before introducing the augmentation technique let us assume that the fluctuations of
the coalitions' values around the mean $v_{nom}$ are independent of the state $x(t)$. We formalize
this in the next assumption where we denote by $\Delta v(t) = v(t) - v_{nom}$ the above fluctuations.

Assumption 4: The state $x(t)$ and the coalitions' values fluctuations $\Delta v(t)$ are independent.

Introducing the fluctuations $\Delta v(t)$ allows us to rewrite dynamics (6) in a more convenient
way. To do this, note first that, as $u(t) = [a(t)^T\ s(t)^T]^T$ and from $B u_{nom} = v_{nom}$, if $a_{nom}$
is fixed then $s_{nom} \in \mathbb{R}^{m-1}$ and therefore also $u_{nom} = [a_{nom}^T\ s_{nom}^T]^T$ are fixed. Let us denote
$\Delta u(t) = u(t) - u_{nom}$. Dynamics (6) can be rewritten as follows:

$\dot x(t) = B u(t) - v(t) = B u(t) - (v_{nom} + (v(t) - v_{nom})) = B u(t) - v_{nom} - \Delta v(t)$
$= B (u(t) - u_{nom}) - \Delta v(t) = B \Delta u(t) - \Delta v(t)$.

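We mentioned before that we will focus on a stricter version of (11). We do this by augmenting
the state as shown next. First, denote by $B^\dagger$ a generic pseudo inverse matrix of $B$ and complete
matrices $B$ and $B^\dagger$ with matrices $C$ and $F$ such that

$\begin{bmatrix} B \\ C \end{bmatrix} \begin{bmatrix} B^\dagger & F \end{bmatrix} = I$. (12)

Then, building upon the new square matrix $\begin{bmatrix} B \\ C \end{bmatrix}$, let us consider the augmented system

$\dot x(t) = B \Delta u(t) - \Delta v(t)$,
$\dot y(t) = C \Delta u(t)$. (13)

Here we assume that $v(t)$ is independent of $y(t)$ as well. After integrating the above system
(see (14), right) we define a new variable $z(t)$ as follows:

$z(t) = \begin{bmatrix} B^\dagger & F \end{bmatrix} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}$, $\begin{bmatrix} x(t) \\ y(t) \end{bmatrix} = \begin{bmatrix} B \\ C \end{bmatrix} z(t)$. (14)

It turns out that to drive $x(t)$ to zero w.p.1, and obtain $u_{nom}$ as average allocation in the
long run, we can rely on a simple function $\phi(\cdot)$, which depends on $z(t)$. Before introducing
this function, for future purposes observe that the dynamics for $z(t)$ satisfies the first-order
differential equation:

$\dot z(t) = \begin{bmatrix} B^\dagger & F \end{bmatrix} \begin{bmatrix} \dot x(t) \\ \dot y(t) \end{bmatrix} = \begin{bmatrix} B^\dagger & F \end{bmatrix} \begin{bmatrix} B \\ C \end{bmatrix} \Delta u(t) - \begin{bmatrix} B^\dagger & F \end{bmatrix} \begin{bmatrix} \Delta v(t) \\ 0 \end{bmatrix}$
$= \Delta u(t) - B^\dagger \Delta v(t)$. (15)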
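The change of variables in (12)-(15) can be verified on a toy instance. The matrices below are hypothetical (a 1 x 2 matrix $B$ completed by a 1 x 2 matrix $C$); they are chosen so that the inverse of the stacked matrix can be written by hand.

```python
def matmul(X, Y):
    """Plain matrix product for lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

B = [[1.0, 1.0]]          # hypothetical 1 x 2 matrix
C = [[1.0, -1.0]]         # completes B to a square, nonsingular matrix
B_dag = [[0.5], [0.5]]    # right inverse of B (B B_dag = 1)
F = [[0.5], [-0.5]]

stacked = B + C                                  # [B; C]
left = [b + f for b, f in zip(B_dag, F)]         # [B_dag  F]
identity = matmul(left, stacked)                 # should be the 2 x 2 identity

du = [[0.25], [-0.125]]   # hypothetical control deviation
dv = [[0.5]]              # hypothetical value fluctuation
x_dot = matmul(B, du)[0][0] - dv[0][0]           # first line of (13)
y_dot = matmul(C, du)[0][0]                      # second line of (13)

z_dot_via_xy = matmul(left, [[x_dot], [y_dot]])  # z' = [B_dag F][x'; y']
Bdag_dv = matmul(B_dag, dv)
z_dot_direct = [[du[i][0] - Bdag_dv[i][0]] for i in range(2)]  # du - B_dag dv
print(identity, z_dot_via_xy, z_dot_direct)
```

Both routes give the same derivative of $z(t)$, which is the content of (15) for this toy instance.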

Fig. 4. Dynamical System.

Let $\Delta u^{min}$ and $\Delta u^{max}$ be the minimal and maximal values of $\Delta u(t)$ for the following
constraint to hold true: $u(t) = u_{nom} + \Delta u(t) \in U$. Then, let us formally define $\phi(z(t))$
as:

$\phi(z(t)) := u_{nom} + \Delta u(t) \in U$, $\Delta u(t) = \mathrm{sat}_{[\Delta u^{min}, \Delta u^{max}]}(-z(t))$, (16)

where with $\mathrm{sat}_{[a,b]}(\xi)$ we denote the saturated function that, given a generic vector $\xi$ and
lower and upper bounds $a$ and $b$ of same dimensions as $\xi$, returns

$[\mathrm{sat}_{[a,b]}(\xi)]_i = \begin{cases} b_i & \text{if } \xi_i > b_i, \\ a_i & \text{if } \xi_i < a_i, \\ \xi_i & \text{if } a_i \le \xi_i \le b_i. \end{cases}$
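The saturated controller in (16) is straightforward to evaluate componentwise. The following is a minimal sketch; the function names `sat` and `phi` and the numerical bounds are assumptions made for the example.

```python
def sat(xi, lower, upper):
    """Componentwise saturation of xi between the bounds lower and upper."""
    return [min(max(x, lo), hi) for x, lo, hi in zip(xi, lower, upper)]

def phi(z, u_nom, du_min, du_max):
    """Full-information control: u = u_nom + sat(-z)."""
    du = sat([-zi for zi in z], du_min, du_max)
    return [un + d for un, d in zip(u_nom, du)]

u_nom = [3.0, 3.0]                         # hypothetical nominal control
du_min, du_max = [-1.0, -1.0], [1.0, 1.0]  # hypothetical deviation bounds
z = [2.5, -0.25]

u = phi(z, u_nom, du_min, du_max)
print(u)   # first component saturates at the lower bound, second does not
```

A large positive component of $z(t)$ pushes the corresponding control to its lower bound, while small deviations are corrected proportionally.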

Now, taking the control $u(t) = \phi(z(t))$, we obtain the dynamic system $\dot x(t) = B \phi(z(t)) - v(t)$
as displayed in Fig. 4. With the above preamble in mind, we are ready to state the following
convergence property.

Theorem 4.1: Using the controller $\phi(z(t))$, as in (16), we have $\lim_{t \to \infty} z(t) = 0$ w.p.1 and
therefore $\lim_{t \to \infty} u(t) = u_{nom}$.

In the next corollary, we use the previous result to provide an answer to Problem 2.1.

Corollary 4.1: The state $x(t)$ is driven to zero w.p.1 as expressed in (11), the average
allocation converges to the nominal allocation, i.e., $\lim_{t \to \infty} \bar a(t) = a_{nom}$ w.p.1, and the excesses
converge to the direction $\Sigma_0 = \alpha t$ with $\alpha = s_{nom}$, i.e., $\lim_{t \to \infty} \epsilon(t) \in \Sigma_0$.

Proof: This is a direct consequence of the result proved in the previous theorem. From
(14), and $[B^\dagger\ F]$ being a nonsingular matrix, we have $\lim_{t \to \infty} x(t) = 0$ w.p.1. From the
previous theorem we also have $\lim_{t \to \infty} u(t) = u_{nom}$. Since $u(t) = [a^T(t)\ s^T(t)]^T$ we have
that $\lim_{t \to \infty} a(t) = a_{nom}$ and $\lim_{t \to \infty} \epsilon(t) = \lim_{t \to \infty} \tilde s(t) = s_{nom} t$. ■

To summarize, in the full information case, the controller $u(t)$ defined by (16) induces an
allocation sequence $a(t)$ such that the average $\bar a(t)$ converges to $A_0 = a_{nom}$ and the excesses
approach $s_{nom} t$.

B. Partial information case 

In the previous section we observed that if the planner has full observation of the excesses 
and therefore of x{t) then he can design an allocation rule so that the average allocations 
are driven to a„om and the excesses approach Snomt- In this section, we solve Problem 12.11 
with ^0 = C{vnom) and under the assumption that the planner has partial observation of 
$x(t)$. In particular, we assume that the planner observes the sign of $x(t)$ for all $t \in \mathbb{R}_+$. An information structure based on the sign of $x(t)$ has an oracle-based interpretation which we discuss in detail in Subsection IV-B1.

Similarly to the previous section, suppose that we know a particular allocation $a_{nom}$ in the core $C(v_{nom})$, and let us study the convergence properties of the average allocations. In particular, using an allocation rule $u(t) = \phi(x(t))$, we require that $x(t)$, satisfying the dynamics $\dot x(t) = B\phi(x(t)) - v(t)$, converges to zero in probability. In this section, we state the second main result of this work, which provides a solution to Problem 2.1 with a partial information structure. To do this, let us denote again by $B^{\dagger}$ a generic pseudo-inverse matrix of $B$ and take a feasible allocation $u_{nom}$ such that

$$B u_{nom} = v_{nom} := \lim_{t\to\infty} \bar v(t), \qquad u_{nom} \in U.$$

Also, for future purposes, define a function $\phi(\cdot)$, which depends only on the sign of $x(t)$, as follows:

$$\phi(\mathrm{sgn}(x(t))) := u_{nom} + \Delta u(t) \in U, \qquad \Delta u(t) = -\delta B^{\dagger}\, \mathrm{sgn}(x(t)). \qquad (17)$$

Now, taking the control $u(t) = \phi(\mathrm{sgn}(x(t)))$, we obtain the dynamic system $\dot x(t) = B\phi(\mathrm{sgn}(x(t))) - v(t)$ as displayed in Fig. 5. Now, we state the following convergence property.

Theorem 4.2: Using the controller $u(t) = \phi(\mathrm{sgn}(x(t)))$ as in (17), we have $\lim_{t\to\infty} x(t) = 0$ w.p.1.

Corollary 4.2: The average allocation converges to the core of the average game as in Q 
and the excesses e{t) converge to as in (fTOl) . 
Fig. 5. Dynamical System (block diagram with blocks $\dot x(t) = B\Delta u(t) - \Delta v(t)$, $\mathrm{sgn}(x(t))$ and $\phi(\mathrm{sgn}(x(t)))$, and signal $u(t)$).



Proof: Direct consequence of Theorem 4.2 and Lemma 3.1. ■

1) Oracle-based interpretation: In this subsection we elaborate more on the partial information structure. In particular, we highlight how the feedback on the state $x(t)$ can be viewed as the result of an oracle-based procedure. To see this, assume that the planner knows the sign of $x(t)$. Since $x(t) = (e(t) - s(t)) - (e(0) - x(0))$, $\mathrm{sgn}(x(t))$ reflects over-satisfaction of coalitions with respect to the threshold $s(t)$. In particular, take without loss of generality $e(0), x(0) = 0$; then, with reference to component $j$, the sign of $x_j(t)$ yields:

$$\mathrm{sgn}(x_j(t)) := \begin{cases} 1 & e_j(t) > s_j(t) \\ 0 & e_j(t) = s_j(t) \\ -1 & e_j(t) < s_j(t). \end{cases} \qquad (18)$$

To summarize, we can think of a situation where the planner approaches an oracle that tells him the sign of $x(t)$. Since $s(t)$ is chosen by the planner for every $t$, the accumulated surplus $s(t)$ is given as an input to the oracle. The oracle returns "yes" if the actual excess is greater than $s(t)$ and "no" otherwise. The use of an oracle is an element in common with the ellipsoid method in optimization and with a large literature [26] on cutting planes.
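The oracle query can be sketched in a few lines. The snippet assumes, as in the text, that $e(0), x(0) = 0$, so that $x(t) = e(t) - s(t)$; the function name `oracle` and the sample excesses and thresholds are hypothetical.

```python
import numpy as np

def oracle(e, s):
    """Sign feedback of (18): +1 if e_j > s_j, 0 if e_j = s_j, -1 if e_j < s_j."""
    return np.sign(e - s)

# Hypothetical excesses e(t) and nonnegative thresholds s(t) for three coalitions.
e = np.array([1.2, 0.5, 0.0])
s = np.array([1.0, 0.5, 0.3])

answers = oracle(e, s)
# Only a "+1" answer is informative: it certifies e_j > s_j >= 0.
in_excess = answers == 1
```

Only the first coalition is certified to be in excess here; for the other two the planner learns nothing beyond the sign.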

Recall that nonnegativeness of the threshold has its roots in the feasibility condition $u(t) \in U$ for all $t \ge 0$, with feasible set $U$ as in

Nonnegativeness of the threshold provides us with a further comment on the information available to the planner. Actually, from the first condition in (18), we can conclude that coalitions associated to a positive state $x(t)$ are certainly in excess. This is clear if we observe that $\mathrm{sgn}(x_j(t)) = 1$ implies $e_j(t) > s_j(t) \ge 0$. We can then summarize the information content available to the planner as follows, where $S$ is the generic coalition associated with component $j$:

$$\mathrm{sgn}(x_j(t)) \begin{cases} = 1, & \text{then coalition } S \text{ in excess} \\ \neq 1, & \text{nothing can be said.} \end{cases}$$


Trivially, the development in the full information case in Section IV-A, which is all based on control strategy (16), fits the case where $x(t)$ is revealed completely. In this last case, the fact that the planner knows $x(t)$ implies that he knows $e(t)$ as well. Also, it is intuitive to infer that in this last setup, exact knowledge of $x(t)$ can only improve the speed of convergence of the allocations to the core of the average game.

Remark 4.1: As the planner knows a priori the nominal game and a corresponding nominal allocation vector, a natural question that arises is why one has to design an allocation rule as given by (16) and (17) instead of a stationary rule $\phi(\cdot) = u_{nom}$. The rules given by (16) and (17) intuitively translate to meeting the demands of coalitions in an average sense. This feature reflects the patience of coalitions in a dynamic setting, i.e., even if a demand is not met instantaneously, a coalition is willing to wait and stay in the grand coalition as the demand is fulfilled in an average sense.

C. Connections to Approachability and Attainability 

1) Approachability: Approachability theory was developed by Blackwell in 1956 [9] and is captured in the well known Blackwell's Theorem. Along the lines of Section 3.2 in [18], we recall next the geometric (approachability) principle that lies behind Blackwell's Theorem. The goal of this section is to show that such a geometric principle shares striking similarities with the solution approach used in the previous sections.

To introduce the approachability principle, let $\Phi$ be a closed and convex set in $\mathbb{R}^m$ and let $P(y)$ be the projection of any point $y \in \mathbb{R}^m$ (closest point to $y$ in $\Phi$). Also denote by $\bar y_k$ the average of $y_1, \ldots, y_k$, i.e., $\bar y_k = \frac{1}{k}\sum_{j=1}^{k} y_j$, and let $\mathrm{dist}(\bar y_k, \Phi)$ be the Euclidean distance between point $\bar y_k$ and set $\Phi$.

Lemma 4.1: (Approachability principle [18]) Suppose that a sequence of uniformly bounded vectors $y_k$ in $\mathbb{R}^m$ satisfies condition (19):

$$[\bar y_k - P(\bar y_k)]^T [y_{k+1} - P(\bar y_k)] \le 0, \qquad (19)$$

then $\lim_{k\to\infty} \mathrm{dist}(\bar y_k, \Phi) = 0$.
Now, to make use of the above principle in our setup, let us consider the discrete-time analog of the excess dynamics

$$x_{k+1} = x_k + B\Delta u_k - \Delta v_k,$$

and define a new variable $y_k = x_k - x_{k-1}$ so that we can look at the sequence of averages $\bar y_k$ in $\mathbb{R}^m$. Likewise, consider the discrete-time version of control (17) as displayed below:

$$\phi(\mathrm{sgn}(x_k)) := u_{nom} + \Delta u_k \in U, \qquad \Delta u_k = -\delta B^{\dagger}\, \mathrm{sgn}(x_k - x_0). \qquad (20)$$

We are now in a position to state the main result of this section. 

Theorem 4.3: Using the controller $u_k = \phi(\mathrm{sgn}(x_k - x_0))$ as in (20), we have that

i) the vector $0$ is approachable by the sequence $y_k$,

$$\lim_{k\to\infty} \bar y_k = 0, \quad \text{w.p.1}, \qquad (21)$$

and therefore

ii) the average allocations converge to the core of the average game,

$$\lim_{k\to\infty} \bar a_k \in C(v_{nom}), \quad \text{w.p.1}. \qquad (22)$$

The strength of the above result is in that it sheds light on how the convergence problem 
dealt with in this work has a stochastic stability interpretation as well as an approachability 
one. 
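A small simulation can illustrate the approachability statement. The sketch below iterates the discrete-time excess dynamics under a sign-based correction and tracks the average $\bar y_k = (x_k - x_0)/k$. It rests on assumptions that are not taken from the paper: $B B^{\dagger} = I$ (so that $B\Delta u_k = -\delta\,\mathrm{sgn}(x_k - x_0)$), a disturbance $\Delta v_k$ drawn uniformly from $[-c, c]$ with $c < \delta$, and arbitrary values for the dimension, horizon, $\delta$ and $c$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, K = 4, 5000                # hypothetical dimension and horizon
delta, c = 1.0, 0.5           # hypothetical gain and disturbance bound, with c < delta

x0 = rng.uniform(-2.0, 2.0, size=m)
x = x0.copy()
for k in range(K):
    dv = rng.uniform(-c, c, size=m)       # zero-mean bounded disturbance
    B_du = -delta * np.sign(x - x0)       # B @ du_k, assuming B @ pinv(B) = I
    x = x + B_du - dv                     # x_{k+1} = x_k + B du_k - dv_k

# Telescoping sum: the average of y_j = x_j - x_{j-1} over j = 1..K.
y_bar = (x - x0) / K
```

Under these assumptions each component of $x_k - x_0$ stays within $\delta + c$ of zero, so $\|\bar y_K\|_\infty \le (\delta + c)/K$, which shrinks as the horizon grows.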

Remark 4.2: (Continuous-time approachability) We can reformulate Theorem 4.3 in continuous time. To see this, let us first define $y(t) := \dot x(t)$. Next we need to derive the continuous-time version of (19). To this aim, let $t \mapsto r(t)$ be a differentiable continuous-time variable and let $z(t) = \frac{r(t) - r(0)}{t}$, so that $t\dot z(t) + z(t) = \dot r(t)$. Discrete-time versions are given as $z_k = \frac{1}{k} r_k$ and $z_{k+1} = \frac{1}{k+1} r_{k+1}$. The approachability principle is given as

$$[z_k - P(z_k)]^T [\varphi - P(z_k)] \le 0,$$

where $\varphi = (k+1) z_{k+1} - k z_k$. In continuous time the above condition translates to

$$[z(t) - P(z(t))]^T [\varphi - P(z(t))] \le 0$$

and $\varphi = (t + \Delta t) z(t + \Delta t) - t z(t) = t\,(z(t + \Delta t) - z(t)) + \Delta t\, z(t + \Delta t)$. We see that $\frac{\varphi}{\Delta t} = t\,\frac{z(t+\Delta t) - z(t)}{\Delta t} + z(t + \Delta t)$. Further, as $\Delta t \to 0$ we have $\lim_{\Delta t \to 0} \frac{\varphi}{\Delta t} = t\dot z(t) + z(t) = \dot r(t)$. The approachability principle in continuous time can then be restated as

$$[z(t) - P(z(t))]^T [\dot r(t) - P(z(t))] \le 0, \qquad (23)$$

which constitutes the continuous-time version of (19). If $\Phi = \{0\}$ we have $P(z(t)) = 0$ and $z^T(t)\dot r(t) \le 0$. Now, taking $r(t) = x(t)$ we see that $z(t)$ is the average of $y(t)$. Then condition (23) guarantees that $z(t)$, and hence the average of $y(t)$, converges to zero. But this implies that $\lim_{t\to\infty} \frac{x(t) - x(0)}{t} = 0$ and therefore from Lemma 3.1 we arrive at the continuous-time version of (22).

2) Attainability: Attainability is a new notion developed in [4], [19] in the context of 2-player continuous-time repeated games with vector payoffs. Attainability finds its roots in applications to transportation networks, distribution networks, and production networks. The main question is the following one: "Under what conditions does a strategy for player 1 exist such that the cumulative payoff converges (in the lim sup sense) to a pre-assigned set (in the space of vector payoffs) independently of the strategy used by player 2?"

Attainability shares similarities with two main notions in robust control theory [10]. The first notion is called robust global attractiveness and refers to the property of a set to "attract" the state of the system under a proper control strategy and independently of the effects of the disturbance. The second notion is referred to as robustly controlled invariance and describes the property of a set to bound the state trajectory under a proper control strategy and independently of the effects of the disturbance. Both notions are used in the following formalization of the attainability principle. The principle is accompanied by a sketch of the proof, but no formal proof is reported, both because attainability is the main focus of another paper and is only auxiliary to the solution of our main problem here, and because the aforementioned two notions are well known in robust control theory. We refer the readers to [10] and [4], [19] for further details.

Let $\Phi$ be a closed and convex set in $\mathbb{R}^m$ and consider a differentiable continuous-time variable $t \mapsto y(t)$ taking values in $\mathbb{R}^m$ for all $t \ge 0$.

Lemma 4.2: (Attainability principle [4], [19]) Suppose that the differentiable continuous-time variable $t \mapsto y(t)$ satisfies conditions (24)-(25):

$$[y(t) - P(y(t))]^T [\dot y(t) - P(y(t))] < 0, \quad y(t) \notin \Phi, \qquad (24)$$

$$[\dot y(t) - P(y(t))] \le 0, \quad y(t) \in \partial\Phi, \qquad (25)$$

then $\lim_{t\to\infty} \mathrm{dist}(y(t), \Phi) = 0$.

Essentially, condition (25) is strictly related to the subtangentiality conditions as formulated by Nagumo in 1942 and surveyed in [10]. Such conditions are proven to characterize robustly controlled invariant sets. We provide a geometric perspective on such a condition in Fig. 6(b). Consider a 2-player continuous-time repeated game and let $y(t)$ be the cumulative payoff up to time $t$. Denote by $Y$ the set of possible instantaneous vector payoffs, call them $\dot y(t)$, for a fixed strategy of player 1 and for varying strategy of player 2. Condition (25) is equivalent to $Y \subset H := \{\dot y \in \mathbb{R}^m \mid n_{y(t)}^T \dot y(t) \le 0\}$ and guarantees that the cumulative payoff up to time $t + dt$ ($dt$ is the infinitesimal time interval), $y(t + dt)$, does not leave $\Phi$.

As regards condition (24), suppose without loss of generality that $\Phi := \{x \in \mathbb{R}^m \mid V(x) \le \bar k\}$ for a fixed scalar $\bar k$. Condition (24) establishes that the set $\tilde\Phi = \{x \in \mathbb{R}^m \mid V(x) \le k\}$ for any scalar $k$ satisfying $k > \bar k$ is a contractive set. By contractive set we mean that it is invariant and, whenever the state is on the boundary, the control can "push it towards the interior". This is illustrated in Fig. 6(a). Let $Y$ and $\dot y(t)$ have the same meaning as before. Condition (24) establishes that $Y \subset H' := \{\dot y \in \mathbb{R}^m \mid [y(t) - P(y(t))]^T \dot y(t) < 0\}$, which implies that $\mathrm{dist}(y(t + dt), \Phi) < \mathrm{dist}(y(t), \Phi)$ and therefore $\Phi$ is robustly attractive.





Fig. 6. Geometric representation of conditions (24) and (25). (a) Robust global attractiveness: condition (24). (b) Robust control invariance: condition (25).



Based on the above lemma, we can rephrase Theorem 4.2 as follows.

Theorem 4.4: Using the controller $u(t) = \phi(\mathrm{sgn}(x(t)))$ as in (17), we have that the vector $0$ is attainable by $x(t)$.
V. Derivation of the main results

A. Proof of Theorem 4.1

This proof is derived in the context of Lyapunov stochastic stability theory [20]. We start by observing that using $u(t) = \phi(z(t))$ we have:

$$\dot z(t) = B\phi(z(t)) - v(t). \qquad (26)$$

Consider a candidate Lyapunov function $V(z(t)) = \frac{1}{2} z^T(t) z(t)$. The idea is to show that $E[\dot V(z(t))] \le 0$ for all $t \ge 0$. Actually, the theory establishes that if the last condition holds true, then $V(z(t))$ is a supermartingale and therefore by the martingale convergence theorem $\lim_{t\to\infty} V(z(t)) = 0$ w.p.1 (almost surely). To see that $E[\dot V(z(t))] \le 0$ is true, observe that from (15) we have

$$\begin{aligned} E[\dot V(z(t))] &= E[z^T(t)\dot z(t)] \\ &= E[z^T(t)\Delta u(t)] - E[z^T(t) B^{\dagger}\Delta v(t)] \\ &= E[z^T(t)\,\mathrm{sat}(-z(t))] \le 0, \end{aligned}$$

where condition $E[z^T(t) B^{\dagger}\Delta v(t)] = 0$ is a direct consequence of the assumption that $\Delta v(t)$ is independent of $x(t)$ and $y(t)$. But the above condition implies that $\lim_{t\to\infty} V(z(t)) = 0$ w.p.1 and therefore also $\lim_{t\to\infty} z(t) = 0$ w.p.1. So far we have proved the first part of the statement, i.e., that the dynamic system (26) converges to zero w.p.1. For the second part, after integrating dynamics (15), we have

$$\lim_{t\to\infty} \frac{\int_0^t [\Delta u(\tau) - B^{\dagger}\Delta v(\tau)]\,d\tau}{t} = \lim_{t\to\infty} \frac{z(t) - z(0)}{t} = 0.$$

This last condition together with the assumption $v_{nom} := \lim_{t\to\infty} \bar v(t)$ yields

$$\lim_{t\to\infty} \frac{\int_0^t \Delta u(\tau)\,d\tau}{t} = \lim_{t\to\infty} \frac{\int_0^t B^{\dagger}\Delta v(\tau)\,d\tau}{t} = 0,$$

from which we can conclude $\lim_{t\to\infty} \bar u(t) = \lim_{t\to\infty} \frac{\int_0^t [u_{nom} + \Delta u(\tau)]\,d\tau}{t} = u_{nom}$, as claimed in the statement.

1. Stochastic stability involves the time derivative of the expectation of $V(x(t))$. However, since $V(\cdot)$ is non-negative and smooth, the limit and expectation can be interchanged by using the dominated convergence theorem [27].
2. If $\Delta v(t)$ is independent of $x(t)$ and $y(t)$, then $C\Delta v(t)$ is independent of $z(t) = Ax(t) + By(t)$.
B. Proof of Theorem 4.2

Consider a candidate Lyapunov function $V(x(t)) = \frac{1}{2} x^T(t) x(t)$. The idea is to show that $E[\dot V(x(t))] \le 0$ for all $t \ge 0$. For this to be true, it must be

$$\begin{aligned} E[\dot V(x(t))] &= E[x^T(t)\dot x(t)] \\ &= E[x^T(t) B u(t)] - E[x^T(t) v(t)] \\ &= E[x^T(t) B u_{nom}] + E[x^T(t) B\Delta u(t)] - E[x^T(t) v_{nom}] - \underbrace{E[x^T(t)\Delta v(t)]}_{=0} \\ &= E[x^T(t) B\Delta u(t)] \le 0, \end{aligned}$$

where the terms in $u_{nom}$ and $v_{nom}$ cancel because $B u_{nom} = v_{nom}$, and condition $E[x^T(t)\Delta v(t)] = 0$ is a direct consequence of the assumption that $\Delta v(t)$ is independent of $x(t)$. But the above condition $E[x^T(t) B\Delta u(t)] \le 0$ is satisfied since $B\Delta u(t) = -\delta\,\mathrm{sgn}(x(t))$, which in turn implies

$$E[x^T(t) B\Delta u(t)] = E[-\delta \|x(t)\|_1] \le 0.$$

Then we obtain that $\lim_{t\to\infty} V(x(t)) = 0$ w.p.1 and therefore also $\lim_{t\to\infty} x(t) = 0$ w.p.1, and this concludes the proof.
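The key identity in this proof, $x^T B\Delta u = -\delta\|x\|_1$, can be checked numerically. The sketch below assumes a matrix $B$ with full row rank, so that $B B^{\dagger} = I$ holds for the Moore-Penrose pseudo-inverse; the random $3 \times 5$ matrix, the gain and the state are hypothetical and are not the game matrices of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 5))        # hypothetical matrix, full row rank almost surely
B_pinv = np.linalg.pinv(B)             # Moore-Penrose pseudo-inverse, so B @ B_pinv = I

delta = 0.7                            # hypothetical gain
x = np.array([1.5, -0.2, 0.0])         # hypothetical state

du = -delta * B_pinv @ np.sign(x)      # correction term of rule (17)
drift = x @ (B @ du)                   # x^T B du, the integrand of E[dV/dt]
l1_norm = np.sum(np.abs(x))            # ||x||_1
```

With these values `drift` equals $-0.7 \times 1.7 = -1.19$ up to rounding, which is strictly negative whenever $x \neq 0$.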

C. Proof of Theorem 4.3

We first prove that (21) implies (22). Invoking the discrete-time reformulation of Lemma 3.1, we can infer that $\lim_{k\to\infty} \frac{x_k - x_0}{k} = 0$ w.p.1 implies $\lim_{k\to\infty} \bar a_k \in C(v_{nom})$ w.p.1. Observing that $\bar y_k = \frac{x_k - x_0}{k}$, we can conclude that $\lim_{k\to\infty} \bar y_k = 0$ w.p.1 implies $\lim_{k\to\infty} \bar a_k \in C(v_{nom})$ w.p.1.

We now prove that using the controller $u_k = \phi(\mathrm{sgn}(x_k))$ as in (20), condition (21) holds true. To see this, let us invoke the approachability principle in Lemma 4.1 and observe that a sufficient condition for approachability of $\bar y_k$ to $0$ is $\bar y_k^T y_{k+1} \le 0$ for all $k$. This is evident if we take the set $\Phi$ including only the zero vector, $\Phi = \{0\}$, and thus $P(\bar y_k) = 0$ in (19). For the present case, using the definition of $y_k$, condition $\bar y_k^T y_{k+1} \le 0$ would be $\frac{1}{k}(x_k - x_0)^T (x_{k+1} - x_k) \le 0$, which implies $(x_k - x_0)^T B\Delta u_k - (x_k - x_0)^T \Delta v_k \le 0$ for all $k$. Taking the expectation, from the independence assumption on $\Delta v_k$ we know that $E[(x_k - x_0)^T \Delta v_k] = 0$ and so we can write

$$\begin{aligned} E[(x_k - x_0)^T B\Delta u_k - (x_k - x_0)^T \Delta v_k] &= E[(x_k - x_0)^T B\Delta u_k] \\ &= E[(x_k - x_0)^T B(-\delta B^{\dagger}\,\mathrm{sgn}(x_k - x_0))] \le 0. \end{aligned}$$

From the above condition we derive that $\bar y_k^T y_{k+1} \le 0$ w.p.1 for all $k$, and this concludes our proof.
D. Proof of Theorem 4.4

Let us invoke the attainability principle in Lemma 4.2 and observe that a sufficient condition for $x(t)$ to attain $0$ w.p.1 is that

$$E[x^T(t)\dot x(t)] < 0, \quad x(t) \neq 0, \qquad (27)$$

$$E[\dot x(t)] = 0, \quad x(t) = 0. \qquad (28)$$

This is evident if we take the set $\Phi$ including only the zero vector, $\Phi = \{0\}$, and thus $P(x(t)) = 0$ in (24) and (25). Now, observe that condition (27) is equivalent to the condition on $E[\dot V]$ used in the proof of Theorem 4.2. Condition (28) is also satisfied, as $\mathrm{sgn}(0) = 0$ gives $\Delta u(t) = 0$ in (17), and this concludes our proof.

VI. Numerical illustrations

Consider a 3-player coalitional TU game, so $m = 7$, with values of coalitions in the following intervals:

$$v(\{1\}) \in [0,4], \quad v(\{2\}) \in [0,4], \quad v(\{3\}) \in [0,4],$$
$$v(\{1,2\}) \in [0,4], \quad v(\{1,3\}) \in [0,6],$$
$$v(\{2,3\}) \in [0,7], \quad v(\{1,2,3\}) \in [0,12].$$

The convex set $V$ is then a hyperbox characterized by the above intervals. From Assumption 3 the planner knows the long run average game, i.e., $\lim_{t\to\infty} \bar v(t) = v_{nom}$. Without loss of generality we take the balanced nominal game to be $v_{nom} = [1\ 2\ 3\ 4\ 5\ 6\ 10]^T$. In other words, during the simulations we randomize the instantaneous games $v(t) \in V$ so that they satisfy the average behavior given by:

$$\lim_{t\to\infty} \frac{1}{t}\int_0^t v(\tau)\,d\tau = v_{nom}. \qquad (29)$$

Next, we describe an algorithm to generate $P \in \Delta(V)$ and therefore $v(t) \in V$ such that the above condition holds true.

Algorithm

Input: Set $V$ and value $v_{nom}$.

Output: Probability function $P \in \Delta(V)$ to generate $v(t) \in V$.

1: Initialize: generate $m$ random points $r_i \in V \subset \mathbb{R}^m$, $i = 1, 2, \ldots, m$,

2: Solve $Rp = v_{nom}$, with $R = [r_1, r_2, \ldots, r_m]$,

3: If $p \ge 0$ and $\mathbf{1}^T p > 0$, then go to (4) else go to (1),

4: Rescale $R$ as $R = [\mathbf{1}^T p]\,R$ and $p$ as $p = \frac{p}{\mathbf{1}^T p}$,

5: If $r_i \in V$, $i = 1, 2, \ldots, m$, then go to (6) else go to (1),

6: STOP

By construction, $v_{nom}$ is in the relative interior of the convex hull generated by the columns of the matrix $R$. If an instance of the game $v(t)$ is chosen as $r_i$ with probability $p_i$ from the pair $(R, p)$, Assumption 3 is satisfied. For simulations we ran the algorithm 10 times to generate 10 $(R, p)$ pairs in $V$. Further, from each pair $(R, p)$ we take 100,000 random selections (using the Matlab randsrc function) to realize $v(t)$. The step size is set to $\Delta = 0.05$. The results are averaged over the 10 pairs. The nominal choice of allocations and surplus is taken as $u_{nom} = [2.5\ 3\ 4.5\ 1.5\ 1\ 1.5\ 1.5\ 2\ 1.5]^T$. It can be verified that $Bu_{nom} = v_{nom}$.
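The six steps of the algorithm can be sketched as a rejection loop. This is an illustrative Python rendering, not the authors' Matlab code: the function name `generate_pair`, the retry cap, and the small 3-dimensional box with $v_{nom} = [3\ 3\ 3]^T$ are assumptions chosen so that the loop terminates quickly.

```python
import numpy as np

def generate_pair(lower, upper, v_nom, rng, max_tries=100_000):
    """Return (R, p) with R @ p = v_nom, p a probability vector, columns of R in the box."""
    m = len(v_nom)
    lo, hi = lower[:, None], upper[:, None]
    for _ in range(max_tries):
        R = rng.uniform(lo, hi, size=(m, m))       # step 1: m random points as columns
        try:
            p = np.linalg.solve(R, v_nom)          # step 2: solve R p = v_nom
        except np.linalg.LinAlgError:
            continue
        total = p.sum()
        if np.any(p < 0) or total <= 0:            # step 3: need p >= 0 and 1^T p > 0
            continue
        R, p = total * R, p / total                # step 4: rescale R and normalize p
        if np.all(R >= lo) and np.all(R <= hi):    # step 5: rescaled points still in the box
            return R, p                            # step 6: stop
    raise RuntimeError("no admissible (R, p) pair found")

rng = np.random.default_rng(0)
lower, upper = np.zeros(3), np.full(3, 10.0)       # hypothetical box
v_nom = np.array([3.0, 3.0, 3.0])                  # hypothetical nominal game
R, p = generate_pair(lower, upper, v_nom, rng)

# Draw instantaneous games as columns of R with probabilities p and average them.
idx = rng.choice(len(p), size=100_000, p=p)
v_bar = R[:, idx].mean(axis=1)
```

Rescaling leaves the product unchanged, since $(\mathbf{1}^T p\, R)(p / \mathbf{1}^T p) = Rp = v_{nom}$, so the sample average `v_bar` is close to `v_nom`, which is the property required by (29).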

Full information case: The saturation thresholds $\Delta u^{min}$ and $\Delta u^{max}$ are chosen so as to ensure $u(t) \in U$. This condition translates into $u_{min} \le u_{nom} + \mathrm{sat}_{[\Delta u^{min}, \Delta u^{max}]}(-z(t)) \le u_{max}$. Denote by $\mathbf{1}$ a vector with all entries equal to 1. For the instantaneous game a negative allocation/surplus is not allowed, so $u_{min} \ge 0 \cdot \mathbf{1}$. Further, an allocation/surplus greater than the value of the grand coalition is not allowed, so $u_{max} \le v_{nom}(N) \cdot \mathbf{1}$. For the given game parameters, we see that the lower and upper thresholds for the saturation function are $-1$ and $5.5$, respectively. Next, we present the performance results of the robust control law given by equation (16). From Theorem 4.1, $z(t)$ converges to zero w.p.1 and as a result $\frac{x(t) - x(0)}{t}$ converges to zero. Fig. 7(a) illustrates this behavior for the first component of coalition $\{1,2\}$. Further, by Corollary 4.1, the same control law ensures that the average allocations converge to the nominal allocations in the long run, in other words $\lim_{t\to\infty} \bar a(t) = a_{nom}$, and Fig. 7(b) illustrates this behavior.
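The numbers quoted above can be reproduced with a short check. The sketch assumes a particular structure for $B$ that is not spelled out in this section: each row corresponds to a coalition, has ones on the allocations of its members, and (for every coalition except the grand one) a $-1$ on its own surplus entry. Under that assumption, and taking $u_{min} = 0$ and $u_{max} = v_{nom}(N) = 10$, the code verifies $Bu_{nom} = v_{nom}$ and recovers the thresholds $-1$ and $5.5$.

```python
import numpy as np
from itertools import combinations

n = 3
players = range(1, n + 1)
# Coalitions ordered as {1},{2},{3},{1,2},{1,3},{2,3},{1,2,3}.
coalitions = [S for r in range(1, n + 1) for S in combinations(players, r)]
m = len(coalitions)                       # m = 7

# Assumed structure of B: u = [a_1, a_2, a_3, s_1, ..., s_6].
B = np.zeros((m, n + m - 1))
for j, S in enumerate(coalitions):
    for i in S:
        B[j, i - 1] = 1.0                 # allocations of the members of S
    if j < m - 1:
        B[j, n + j] = -1.0                # surplus of S (none for the grand coalition)

v_nom = np.array([1, 2, 3, 4, 5, 6, 10], dtype=float)
u_nom = np.array([2.5, 3, 4.5, 1.5, 1, 1.5, 1.5, 2, 1.5])

u_min, u_max = 0.0, v_nom[-1]             # 0 <= u(t) <= v_nom(N) componentwise
du_min = u_min - u_nom.min()              # largest admissible decrease of any entry
du_max = u_max - u_nom.max()              # largest admissible increase of any entry
```

Scalar thresholds are the conservative choice here: the smallest entry of $u_{nom}$ limits the decrease and the largest entry limits the increase.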

Fig. 7. Performance of the control law given by (16). (a) Plot of $\frac{x_{\{1,2\}}(t) - x_{\{1,2\}}(0)}{t}$. (b) Plot of $\lim_{t\to\infty} \bar a(t) - a_{nom}$.

Partial information case: The choice of $\delta$ is crucial so as to ensure $u(t) \in U$. This condition translates to $u_{min} \le u_{nom} - \delta B^{\dagger}\,\mathrm{sgn}(x) \le u_{max}$. We observe $-\sum_j |B^{\dagger}_{ij}| \le (B^{\dagger}\,\mathrm{sgn}(x))_i \le \sum_j |B^{\dagger}_{ij}|$. A conservative estimate of $\delta$ is obtained as $u_{min} \le u_{nom} \pm \delta \max_i\{\sum_j |B^{\dagger}_{ij}|\} \le u_{max}$. For $m = 7$, we have $\max_i\{\sum_j |B^{\dagger}_{ij}|\} = 2.11$. For the instantaneous game a negative allocation/surplus is not allowed, so $u_{min} \ge 0 \cdot \mathbf{1}$. Furthermore, an allocation/surplus greater than the value of the grand coalition is not allowed, so $u_{max} \le v_{nom}(N) \cdot \mathbf{1}$. We chose $\delta = 1$, which satisfies the above stated requirements. Next, we present performance results of the robust control law given by equation (17). From Theorem 4.2, $x(t)$ converges to zero in probability with a specific choice of control law and as a result $\frac{x(t) - x(0)}{t}$ converges to zero. Fig. 8(a) illustrates this behavior for the first component of coalition $\{1,2\}$. Further, by Corollary 4.2, the same control law ensures that the average allocations converge to the core $C(v_{nom})$, and from equation (17) it is clear that the instantaneous allocations lie in a neighborhood of the nominal allocations. As a result there is uncertainty in the convergence of the average allocations towards the nominal allocations in the long run, and Fig. 8(b) illustrates this behavior.

Fig. 8. Performance of the control law given by (17). (a) Plot of $\frac{x_{\{1,2\}}(t) - x_{\{1,2\}}(0)}{t}$. (b) Plot of $\lim_{t\to\infty} \bar a(t) - a_{nom}$.

VII. Conclusions 

In this paper we studied dynamic cooperative games where at each instant of time the value of each coalition of players is unknown but varies within a bounded polyhedron. With the assumption that the average value of each coalition in the long run is known with certainty, we presented robust allocation schemes, which converge to the core, under two informational settings. We proved the convergence of both allocation rules using Lyapunov stochastic stability theory. Furthermore, we established connections of Lyapunov stability theory to the concepts of approachability and attainability. The control laws or allocation schemes are derived on the premise that the planner knows a priori the nominal allocation vector. If this information is not available, then the problem can be treated as a learning process where the planner is trying to learn the (balanced) nominal game from the instantaneous games. The allocation rules designed in this paper assure stability of the coalitions on average, and as a result capture patience and expectations of the players in an integral sense. The modeling aspects of generic dynamic coalitional games are open questions at this point in time.